(54) A URL rewriting pseudo proxy server

(57) A method for real time remapping of access to
a selected remote domain in an interconnected computer
system network comprising the steps of defining a
pseudo proxy server and translating in the pseudo proxy
server a remote record identifier corresponding to the
remote domain to a remapped record identifier
corresponding to the local domain. In a further enhancement
the method comprises the additional step of determining
if a selected record identifier is a selected remapped
record identifier.

FIG. 1
Description

FIELD OF THE INVENTION 

This invention relates to the field of interconnected 
computers, and more particularly to the field of format- 
ted data distributed on interconnected computers. 

BACKGROUND OF THE INVENTION 

Because the Internet evolved from the ARPAnet, a 
research experiment that supported the exchange of 
data between government contractors and (often academic)
researchers, an on-line culture developed that is
alien to the corporate business world. The Internet was 
not designed to make commercialization easy. 

Domain names direct where e-mail is sent, files are 
found, and computer resources are located. They are
used when accessing information on the WWW or connecting
to other computers through Telnet. Internet
users enter the domain name, which is automatically 
converted to the Internet Protocol address by the 
Domain Name System (DNS). The DNS is a service 
provided by TCP/IP that translates the symbolic name 
into an IP address by looking up the domain name in a 
database. 

The World Wide Web (WWW) is one of the newest 
Internet services. The WWW allows a user to access a 
universe of information which combines text, audio, 
graphics and animation within a hypermedia document. 
Links are contained within a WWW document which 
allows simple and rapid access to related documents. 
The WWW was developed to provide researchers with a 
system that would enable them to quickly access all 
types of information with a common interface, removing 
the necessity to execute a variety of numerous steps to 
access the information. During 1991, the WWW was 
released for general usage with access to hypertext and 
UseNet news articles. Interfaces to WAIS, anonymous
FTP, Telnet and Gopher were added. By the end of 1993
WWW browsers with easy to use interfaces had been
developed for many different computer systems. 

With HyperText Markup Language (HTML) based
pages, such as the WWW, the pages of information
contain pointers to other pages. The pointers are links
which are encoded with Uniform Resource Locators 
(URLs). The URL contains a transmission protocol, 
such as HyperText Transfer Protocol (HTTP), a domain 
name of the target computer system, and a page identi- 
fier. 

Accordingly, with the commercialization of the Inter- 
net through advertising, charging for access to informa- 
tion, and other schemes there is a need for an Internet 
Service Provider (ISP) to record all of the interactions 
that their customers have with HTML based content.

SUMMARY OF THE INVENTION 

In an interconnected computer system network
there is provided a method for real time remapping of a
remote domain to a local domain. The method comprises
the steps of defining a pseudo proxy server and
translating in the pseudo proxy server a remote record
identifier corresponding to the remote domain to a
remapped record identifier corresponding to the local
domain. In a further enhancement the method comprises
the additional step of determining if a selected
record identifier is a selected remapped record
identifier.

In an enhancement of the present invention, there
is provided a method of providing pseudo proxy access
for tracking and controlling access to remote record
identifiers. The method comprises the steps of: providing
a first data set having rewritten record identifiers for
a remote record identifier to a local user; responding to
a request from the local user for a selected record
identifier; determining if the selected record identifier is a
rewritten record identifier; determining an actual record
identifier for the rewritten record identifier; and
requesting a second data set corresponding to the actual
record identifier from said interconnected computer
system network.

In another enhancement of the present invention
the first data set comprises a HyperText Markup
Language based data set.

In a further enhancement of the present invention
the remote record identifier comprises a uniform record
locator.

In yet a further enhancement the present invention
comprises the additional steps of determining that a
record identifier is remote and rewriting the remote
record identifier.

Determining that the record identifier is remote in
an enhancement of the present invention comprises the
step of scanning a domain name of the actual record
identifier and comparing the domain name to a local
domain name wherein the record identifier is remote if
the domain name is different than the local domain
name.

In yet further enhancements of the present invention
the step of determining an actual record identifier
for the rewritten record identifier comprises looking up
the actual record identifier by a predetermined index, by
a hashing table, by addressing a memory location, by
accessing an inode of a disk file, or by accessing a disk
file by a file name.

BRIEF DESCRIPTION OF THE DRAWINGS 

A more complete understanding of the present 
invention may be obtained from consideration of the fol- 
lowing description in conjunction with the drawings in 
which: 

FIG. 1 is an overview of interconnected computer
system networks employing the present invention;
and

FIG. 2 is a flow chart of the procedures of the
present invention for tracking local access
and local control by rewriting URLs.

DETAILED DESCRIPTION OF VARIOUS ILLUSTRATIVE
EMBODIMENTS

Although the present invention is particularly well
suited for use as a URL rewriting pseudo proxy server
for the WWW, and shall be described with respect to
this application, the methods and apparatus disclosed
here can be applied to other schemes employing URLs
as well as other types of resource location pointers and
other record identifiers as links within an interconnected
computer system network.

The WWW allows a user to access a universe of
distributed information which combines text, audio,
graphics and animation within a hypermedia document.
Links are contained within a WWW document which
allow simple and rapid access to related documents.
The WWW provides an access system that enables
users to quickly access all types of information with a
common interface, removing the necessity to execute a
variety of numerous steps to access the information.
The WWW supports interfaces for access to HyperText,
UseNet news, WAIS, anonymous FTP, Telnet and
Gopher.

The WWW has HTML based pages, which contain
pointers to other pages. The pointers are HyperText
links which are encoded with URLs. The URL contains
a transmission protocol, such as HTTP, a domain name
of the target computer system, and a page identifier.

The HyperText links are simply references to other
documents, made up of two parts. The first part is a
reference to a related item such as a document, picture,
movie or sound. The item being referenced can be
within the current document, or it can be located
anywhere on the Internet. The second part is an anchor.
The anchor can be defined to be a word, group of
words, a picture, or any area of the display. A reader
activates an anchor by pointing to it and clicking with a
mouse, when using a graphical browser, or by selecting
it with the cursor (arrow) keys or tab keys, when using a
text based browser. Anchors can be indicated in the
displayed document by color, graphics, reverse video,
underline as well as other formats.

When an anchor is activated, the browser fetches
the item referenced by the anchor. This may involve
reading a document from a local disk drive, or requesting
over the Internet that a document be sent to the local
computer.

The standard way an item is referenced is by a
URL. The URL contains a complete description of the
item, which is made up of a protocol and an address. An
absolute address reference contains the complete
address including domain name, directory path, and file
name. A relative address reference assumes that the
previous domain name and directory path are used.
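
The difference between absolute and relative address references can be illustrated with a minimal sketch using `urljoin` from Python's standard library. The base URL with its `news/` directory and the file name `sports.html` are hypothetical values chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page the reader is currently viewing.
base = "http://regional-paper.com/news/regional-today.html"

# A relative reference reuses the domain name and directory path of the base.
relative = urljoin(base, "sports.html")

# An absolute reference carries its own domain name, path, and file name.
absolute = urljoin(base, "http://local-paper.com/index.html")

print(relative)
print(absolute)
```

In this sketch the relative reference inherits `regional-paper.com/news/` from the page that contains it, while the absolute reference is used unchanged.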

The URL is not limited to identifying WWW HyperText
files, but can also access other sets of data in
different protocols including anonymous FTP, Gopher,
WAIS, UseNet news, and Telnet. The URL format is
typically P://A, where P is the protocol, such as HTTP
(HyperText Transfer Protocol), gopher, FTP (file transfer
protocol), WAIS (Wide Area Information Server), news
(UseNet newsgroups), or Telnet. A is a valid Internet
host address or symbolic location.
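
As an illustration of the protocol and address parts described above, the following sketch splits a URL with `urlsplit` from Python's standard library. The helper name `split_url` is an assumption made for this example and does not come from the specification.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return (protocol, host, path) for a URL of the general form P://A."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path

protocol, host, path = split_url("http://regional-paper.com/regional-today.html")
print(protocol, host, path)
```

The host component is the part that a rewriting server would compare against its own domain name.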

To better understand the present invention, an
example of an embodiment in which a newspaper
consortium composed of individual members that are
interconnected through the Internet shall be used. An individual
member may want to provide access to all of the
consortium member organizations, but would only track
their local subscribers.

Referring to FIG. 1 there is shown an overview of
interconnected computer system networks. Each computer
system network 8 and 10 contains a local computer
processor unit 12 which is coupled to a local data
storage unit 14. The local computer processor unit 12 is
selectively coupled to a plurality of local users 16. Each
of the computer processor units 12 is selectively coupled
to other computer processor units 12 through the
Internet 18. Local users 16 are also selectively coupled
directly to the Internet 18.

A local newspaper, which has a computer system
network 8, such as the Local Paper in Wyoming, may
allow a local user 16 to click into another computer
system network 10, such as a Regional Paper in New York,
through the Internet 18 to access a data storage unit 14
on the other computer system network 10. The Local
Paper computer system network 8 would handle the
billing for the local user 16 and provide an authentication
and reconciliation scheme with the Regional Paper
computer system network 10, permitting both papers to
profit from the venture.

The current technology utilized over the Internet,
specifically HTML based pages, does not provide a
suitable means for achieving the desired scheme. The
HTML based pages contain hyper links, encoded as
URLs, to other pages. If we assume that the Regional
Paper has a machine (domain) name of
regional-paper.com for its computer system network 10 and an
HTML page about regional news today called
regional-today.html, a URL pointing to the regional news today at
the Regional Paper would be

http://regional-paper.com/regional-today.html
which allows access to the appropriate HTTP page
through the Internet 18. In this case the URL acts as a
remote record identifier.

If this URL is included in an HTML page served by
the HTTP server on the computer system network 8 of
the Local Paper, the computer system network 8 of the
Local Paper would have no way of telling if or when the
local user 16 accessed the regional-today page on the
other computer system network 10. Selecting the URL
results in the other computer system network 10 of
regional-paper.com being accessed, and the computer
system network 8 of local-paper.com is not involved
in the access.
An elegant way to achieve tracking of access and
local control is to make all of the URLs local to the
local-paper.com machine, thus permitting the
local-paper.com machine to track and control access.
Referring to FIG. 2, there is shown a flow chart of the
procedures of the present invention which accomplishes
tracking of local access and local control by rewriting the
URLs on the fly as they pass through the
local-paper.com machine in being served to the local user. In
step 20, it is determined on the fly if the actual URL is
remote and if it is to be rewritten. All remote URLs may
be selected for rewriting, or selective groups may be
selected for rewriting. The selection may be based upon
the remote domain name, which can be compared with
a list of remote domain names that are to be tracked, as
well as other comparison criteria. Thus, in step 22 the
selected remote URL

http://regional-paper.com/regional-today.html
would be rewritten as

http://local-paper.com/127.html
while the text and graphics on the HTML page would
remain the same.
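
Steps 20 and 22 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `rewrite_page`, the regular expression, and the use of a plain list position as the index are all hypothetical, and the regular expression handles only double-quoted `href` attributes.

```python
import re
from urllib.parse import urlsplit

LOCAL_DOMAIN = "local-paper.com"        # assumed local domain name
HREF = re.compile(r'href="([^"]+)"')    # simplistic; real HTML needs a parser

def rewrite_page(html, table, tracked=None):
    """Rewrite remote hrefs in html to local URLs and record them in table.

    table is a list of actual URLs; a URL's position in it is its index.
    tracked, if given, is the set of remote domains selected for rewriting.
    """
    def replace(match):
        url = match.group(1)
        host = urlsplit(url).netloc
        # Step 20: is the URL remote, and is it selected for rewriting?
        is_remote = host != "" and host != LOCAL_DOMAIN
        if not is_remote or (tracked is not None and host not in tracked):
            return match.group(0)       # leave the link unchanged
        if url not in table:
            table.append(url)
        # Step 22: replace the remote URL with a local one.
        return 'href="http://%s/%d.html"' % (LOCAL_DOMAIN, table.index(url))
    return HREF.sub(replace, html)

table = []
page = ('<a href="http://regional-paper.com/regional-today.html">Regional</a> '
        '<a href="/local.html">Local</a>')
print(rewrite_page(page, table))
```

Sequential list positions are used here only for brevity. They hide the remote URL but are easy to enumerate, so a deployment that wants hard-to-guess identifiers would need a different index scheme.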

In step 24 the local system sends the HTML page
containing the rewritten URLs to the local user. In step
26 the local user clicks on (selects) a URL on the HTML
page, thus requesting the document 127.html from the
local-paper.com machine. In step 28 the local HTTP
server determines if this is a rewritten URL. If the URL
is rewritten, step 30 looks up the actual URL, and in step
32 sends the HTML page from the regional-paper.com
machine. If the URL was not rewritten, step 30 is
skipped. It is highly desirable that the rewritten URLs be
"blind" and not easily decoded, in order that a user
could not easily defeat the rewriting mechanism. After
step 30 the procedure can repeat again from step 20.
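
The request handling of steps 28 through 32 might look like the sketch below. The path pattern, the function name `handle_request`, and the two callables `fetch_remote` and `read_local` are assumptions introduced for illustration; they stand in for whatever the HTTP server actually uses to retrieve remote and local pages.

```python
import re

REWRITTEN = re.compile(r"^/(\d+)\.html$")   # assumed shape of a rewritten path

def handle_request(path, table, fetch_remote, read_local):
    """Serve a request path, acting as a proxy if it names a rewritten URL."""
    match = REWRITTEN.match(path)                   # step 28
    if match and int(match.group(1)) < len(table):
        actual_url = table[int(match.group(1))]     # step 30
        return fetch_remote(actual_url)             # step 32
    return read_local(path)                         # not rewritten: serve locally

# Stand-in callables so the sketch runs without a network.
table = ["http://regional-paper.com/regional-today.html"]
fetch = lambda url: "remote:" + url
local = lambda path: "local:" + path
print(handle_request("/0.html", table, fetch, local))
```

Passing the fetch functions in as parameters keeps the lookup logic separate from the network access.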

The local-paper.com machine, by serving up the
HTML page from the regional-paper.com, is acting as a
pseudo proxy. Proxy servers are often employed in
environments that contain firewalls. There, the proxy acts on
behalf of the user through the firewall, directing all
HTTP access through it. It is not desirable to supply proxy
service to every user. Many users access the Internet
through a corporate firewall. It is desirable to leave the
user's environment(s) unchanged. The URL rewrite
scheme does this by being completely transparent to
the end user. URLs that are not rewritten, which are
links that we do not want to track, behave as usual.

The proxy server in the rewrite scheme is a pseudo
proxy or domain specific proxy, in that the server only
acts as a proxy for the HTML pages that it is hosting and
the pages that it is pointing to. Typically, proxies have all
or no requests sent through them. In the present
invention, with the pseudo proxy, only the requests in its
domain are served. The conversion of the original
remote URL to a local/pseudo proxy based URL can be
implemented efficiently. The rewriting of the URLs is a
remapping of selected record identifiers from one
domain to another domain (between a local and a
remote domain).

First the URL is recognized as a remote URL, which
is shown as step 20. This can be accomplished by
scanning the domain name part of the URL. If the domain
name is remote when compared to a local domain name
or when compared to a predetermined table of domain
names that are to be tracked, the remote URL is
replaced by an opaque local URL, which is shown in
step 22. An opaque URL is one that the user cannot
easily generate or reconstruct the remote URL from, as
this would subvert the process. This can be
accomplished by using indices that are private to the HTTP
server. The generation of the indices can be
accomplished from a local register, an incremented integer, a
memory address from where the string is stored in a
database, the inode of a disk file, or a simple disk file
name.
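
Of the index sources listed above, the incremented integer is the simplest to sketch. The class name `IndexAllocator` and its methods are hypothetical; the starting value 127 is chosen only to match the `127.html` example used earlier.

```python
import itertools

class IndexAllocator:
    """Hands out private integer indices for remote URLs."""

    def __init__(self, start=0):
        self._next = itertools.count(start)   # the incremented integer
        self._by_url = {}                     # remote URL -> index
        self._by_index = {}                   # index -> remote URL

    def index_for(self, url):
        """Return the index for url, allocating a new one on first use."""
        if url not in self._by_url:
            index = next(self._next)
            self._by_url[url] = index
            self._by_index[index] = url
        return self._by_url[url]

    def url_for(self, index):
        """Return the remote URL stored under index, or None if unknown."""
        return self._by_index.get(index)

allocator = IndexAllocator(start=127)
index = allocator.index_for("http://regional-paper.com/regional-today.html")
print("http://local-paper.com/%d.html" % index)
```

Because both mappings live only in the server's memory, the index alone gives the user no way to reconstruct the remote URL.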

The conversion of the proxy URL can be done by
using indices. The number is an index into an array
where the actual remote URL is stored, utilizing a
minimal perfect hash. Hashing is a technique for arranging a
set of items, in which a hash function is applied to the
key of each item to determine its hash value. The hash
value identifies each item's primary position in a hash
table, and if this position is already occupied, the item is
inserted either into an overflow table or in another
available position in the table.
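
The general hashing behaviour described above, where an item whose primary position is occupied goes to another available position, can be sketched with linear probing. This is not a minimal perfect hash, which by construction has no collisions; it only illustrates the collision case. The class name `ProbingTable` and the table size are assumptions for the example.

```python
class ProbingTable:
    """Fixed-size hash table that resolves collisions by linear probing."""

    def __init__(self, size=8):
        self._slots = [None] * size

    def _probe(self, key):
        # Start at the primary position, then try each following slot in turn.
        start = hash(key) % len(self._slots)
        for step in range(len(self._slots)):
            yield (start + step) % len(self._slots)

    def put(self, key, value):
        for pos in self._probe(key):
            slot = self._slots[pos]
            if slot is None or slot[0] == key:
                self._slots[pos] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key):
        for pos in self._probe(key):
            slot = self._slots[pos]
            if slot is None:
                return None          # an empty slot ends the search
            if slot[0] == key:
                return slot[1]
        return None

urls = ProbingTable(size=4)
urls.put(1, "http://regional-paper.com/regional-today.html")
urls.put(5, "http://regional-paper.com/sports.html")   # 5 % 4 == 1, so it collides with key 1
print(urls.get(5))
```

With integer keys 1 and 5 and a table of four slots, both keys share primary position 1, so the second item lands in the next free slot.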

The indices also provide a simple way of tracking
access to the remote URLs, with the level of detail
tracking limited only by the level of detail that is recorded.
Further, the indices can be utilized to determine if
access to the remote URL is to be granted or denied,
which may depend upon the particular status or identity of
a local user.
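
A minimal sketch of index-based tracking and access control follows. The class name `AccessController`, the per-index set of permitted users, and the user names are all hypothetical; the specification does not prescribe how the status or identity of a user is represented.

```python
from collections import Counter

class AccessController:
    """Tracks and gates access to remote URLs by their private index."""

    def __init__(self, table, allowed_users):
        self.table = table             # index -> actual remote URL
        self.allowed = allowed_users   # index -> set of permitted user names (assumed policy)
        self.hits = Counter()          # (user, index) -> number of granted requests

    def resolve(self, user, index):
        """Return the remote URL if access is granted, otherwise None."""
        if index not in self.table:
            return None
        if user not in self.allowed.get(index, set()):
            return None                # access denied for this user
        self.hits[(user, index)] += 1  # record the granted access
        return self.table[index]

controller = AccessController(
    {127: "http://regional-paper.com/regional-today.html"},
    {127: {"alice"}},
)
print(controller.resolve("alice", 127))
print(controller.resolve("bob", 127))
```

The counter here records only how many times each user was granted each index; a finer level of detail would require recording more per request.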

An alternative scheme is when the name is a
number of a memory address or a key stored in a
database. Another alternative scheme is to utilize the disk
inode, which requires that the inode be looked up in the
disk inode table. When a disk file name is used, the file
is opened, which can contain the remote URL.
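
The disk file name alternative can be sketched as below, assuming each rewritten name corresponds to a small file whose contents are the remote URL. The helper names `store_url` and `load_url` and the one-URL-per-file layout are assumptions for illustration.

```python
import os
import tempfile

def store_url(directory, name, url):
    """Write the remote URL into a file named after the rewritten document."""
    with open(os.path.join(directory, name), "w", encoding="utf-8") as handle:
        handle.write(url + "\n")

def load_url(directory, name):
    """Open the named file and return the remote URL it contains, or None."""
    # basename discards any directory components in the requested name
    path = os.path.join(directory, os.path.basename(name))
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read().strip()
    except FileNotFoundError:
        return None

with tempfile.TemporaryDirectory() as directory:
    store_url(directory, "127.html", "http://regional-paper.com/regional-today.html")
    print(load_url(directory, "127.html"))
```

A missing file simply means the requested name is not a rewritten URL known to the server.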

Numerous modifications and alternative embodiments
of the invention will be apparent to those skilled
in the art in view of the foregoing description.
Accordingly, this description is to be construed as illustrative
only and is for the purpose of teaching those skilled in
the art the best mode of carrying out the invention.
Details of the structure may be varied substantially
without departing from the spirit of the invention and the
exclusive use of all modifications which come within the
scope of the appended claim is reserved.

Claims 

1. In an interconnected computer system network a
method for real time remapping a remote domain to
a local domain, said method comprising the steps
of:

defining a pseudo proxy server; and
translating in said pseudo proxy server a
remote record identifier corresponding to said
remote domain to a remapped record identifier
corresponding to said local domain.

2. The method as recited in claim 1 comprising the
additional step of determining if a selected record
identifier is a selected remapped record identifier.

3. The method as recited in claim 2 comprising the
additional step of requesting a data set corresponding
to a selected remote record identifier corresponding
to said selected remapped record identifier.

4. The method as recited in claim 1 comprising the
additional step of tracking access of a local user to
said remote domain through said pseudo proxy
server.

5. The method as recited in claim 1 comprising the
additional step of restricting access of a local user
to said remote domain through said pseudo proxy
server.

6. The method as recited in claim 1 wherein said
remote record identifier comprises a uniform record
locator.

7. The method as recited in claim 1 wherein the step
of translating a remote record identifier comprises
looking up said remote record identifier by a
predetermined index.

8. The method as recited in claim 7 wherein the step
of translating a remote record identifier further
comprises a hashing table.

9. The method as recited in claim 8 wherein said
hashing table comprises a minimal perfect hash.

10. The method as recited in claim 7 wherein the step
of translating a remote record identifier further
comprises addressing a memory location.

11. The method as recited in claim 7 wherein the step
of translating a remote record identifier further
comprises accessing an inode of a disk file.

12. The method as recited in claim 1 wherein the step
of translating a remote record identifier comprises
accessing a disk file by a file name.

13. The method as recited in claim 3 wherein said data
set comprises a hypertext transfer protocol request.

14. In an interconnected computer system network a
method of tracking and controlling access to remote
record identifiers, said method comprising the steps
of:

providing a first data set having a rewritten
record identifier for a remote record identifier to
a local user;
responding to a request from said local user for
a selected record identifier;
determining if said selected record identifier is
a rewritten record identifier;
determining an actual record identifier for said
rewritten record identifier; and
requesting a second data set corresponding to
said actual record identifier from said
interconnected computer system network.

15. The method as recited in claim 14 wherein said first
data set comprises a hypertext markup language
based data set.

16. The method as recited in claim 14 wherein said
remote record identifier comprises a uniform record
locator.

17. The method as recited in claim 14 comprising the
additional steps of determining that a record
identifier is remote and rewriting said remote record
identifier.

18. The method as recited in claim 17 wherein the
step of determining that a record identifier is
remote comprises scanning a domain name of said
actual record identifier and comparing said domain
name to a local domain name wherein said record
identifier is remote if said domain name is different
than said local domain name.

19. The method as recited in claim 14 wherein the step
of determining an actual record identifier for said
rewritten record identifier comprises looking up said
actual record identifier by a predetermined index.

20. The method as recited in claim 19 wherein the step
of determining an actual record identifier further
comprises a hashing table.

21. The method as recited in claim 20 wherein said
hashing table comprises a minimal perfect hash.

22. The method as recited in claim 19 wherein the step
of determining an actual record identifier further
comprises addressing a memory location.

23. The method as recited in claim 19 wherein the step
of determining an actual record identifier further
comprises accessing an inode of a disk file.

24. The method as recited in claim 14 wherein the step
of determining an actual record identifier for said
rewritten record identifier comprises accessing a
disk file by a file name.

25. The method as recited in claim 24 wherein said disk 
file contains a domain name of said actual record 
identifier. 

26. The method as recited in claim 24 wherein said disk 
file contains said actual record identifier. 

27. The method as recited in claim 14 wherein said 
request from said local user comprises a hypertext 
transfer protocol request. 

28. In an interconnected computer system network a
method of providing pseudo proxy access for tracking
and controlling access to remote uniform record
locators, said method comprising the steps of:

providing a hypertext markup language based
page having a rewritten uniform record locator
for a remote uniform record locator to a local
user;
responding to a request from said local user for
a selected uniform record locator;
determining if said selected uniform record
locator is a rewritten uniform record locator;
determining an actual uniform record locator
for said rewritten uniform record locator; and
requesting a second data set corresponding to
said actual uniform record locator from said
interconnected computer system network.

29. The method as recited in claim 28 comprising the
additional steps of determining that a uniform
record locator is remote by comparing a domain
name of said uniform record locator to a local
domain name, wherein said uniform record locator
is remote if said domain name is different than said
local domain name and rewriting said remote
uniform record locator.

30. The method as recited in claim 28 wherein the step
of determining an actual uniform record locator for
said rewritten uniform record locator comprises
looking up said actual uniform record locator by a
predetermined index.

31. The method as recited in claim 30 wherein the step
of determining an actual uniform record locator
further comprises a hashing table.

32. The method as recited in claim 31 wherein said
hashing table comprises a minimal perfect hash.

33. The method as recited in claim 28 wherein the step
of determining an actual uniform record locator
further comprises addressing a memory location.

34. The method as recited in claim 28 wherein the step
of determining an actual uniform record locator
further comprises accessing an inode of a disk file.

35. The method as recited in claim 28 wherein the step
of determining an actual uniform record locator for
said rewritten uniform record locator comprises
accessing a disk file by a file name.

36. The method as recited in claim 28 comprising the
additional steps of determining that a uniform
record locator is remote by comparing a domain
name of said uniform record locator to
a predetermined table of domain names and
rewriting said remote uniform record locator.